Serbo-Croatian Hyphenation: a TeX Point of View
Abstract
Serbo-Croatian is one of the South Slavic languages. Like other Slavic languages, it is characterized by a rich morphology. A particular feature of the language is its almost fully phonological orthography, i.e. at the word level, one letter corresponds to each phoneme and vice versa. As a result, written text practically represents a phonemic transcription of speech. Still, the Serbo-Croatian literary language has two main pronunciations, ekavian and jekavian, which reflect the different development of the pronunciation of the old Slavic sound ѣ (jat). The sound ѣ is usually replaced by the vowel e in the ekavian dialect (for instance, dete, mleko, večan, čovek), while in the jekavian dialect it is usually replaced either by the two-syllable group ije (dijete, mlijeko) or by the one-syllable group je (vječan, čovjek). Those differences in pronunciation are recorded in the written text. Accent has a distinctive role in Serbo-Croatian, and as it is not marked in written texts there are a number of homographs.
Two alphabets are in use: Latin and Cyrillic. The Serbo-Croatian Latin alphabet is different from the English alphabet. Both letters with diacritics (č, ć, ž, š, đ) and digraphs (dž, lj, nj) are in use, and they all have a separate place in the alphabet. The order of the Serbo-Croatian Latin alphabet is therefore as follows: a, b, c, č, ć, d, dž, đ, e and so on. As the letters q, w, x and y do not exist in the Serbo-Croatian alphabet, the total number of letters is 30. Transcription of foreign words and names is compulsory in Serbo-Croatian of ekavian pronunciation, while jekavian pronunciation allows the orthography of the source language.
While all the letters with diacritics are assigned separate keys on the standardized national keyboard, as well as positions in the national version of the 7-bit code [1, 2, 3], neither keys nor codes are provided for digraphs, so they are input by striking two keys, i.e. by entering two codes.
Besides that, although the standard provides a separate key for the letter đ, the keyboards of old typewriters often did not have it. As a result, this letter was, and sometimes still is, recorded as the digraph dj, in spite of orthographic rules.
Serbo-Croatian Cyrillic has the equivalent 30 letters, but with neither diacritics nor digraphs. The order of the letters in the Serbo-Croatian Cyrillic alphabet is completely different from the order in the Latin alphabet. The Serbo-Croatian Cyrillic alphabet is also different from the Russian alphabet, as there are letters which do not exist in Russian Cyrillic (ђ, ј, љ, њ, ћ, џ), and vice versa. This is important because Russian Cyrillic was the basis for the development of the appropriate international coding standards.
The digraphs of the Serbo-Croatian Latin alphabet can cause problems when using formatting and typesetting programs, particularly for hyphenation and automatic transcription from the Latin to the Cyrillic alphabet. These problems can be caused by each of the combinations lj, nj, dž and dj, which in the text may represent either a digraph or a consonant cluster. A digraph is always transcribed into one Cyrillic letter and is never hyphenated. For instance, nadžak-baba is transcribed into наџак-баба and in both cases is hyphenated as na-džak-ba-ba. On the other hand, a consonant cluster is always transcribed into two Cyrillic letters and can, in principle, be hyphenated. For instance, nadživeti is transcribed into надживети and is hyphenated as nad-ži-ve-ti.
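The difference between a digraph and a consonant cluster in transcription can be sketched in Python. This is a minimal illustration under stated assumptions, not the approach taken in the paper: the names `LATIN_TO_CYRILLIC`, `CLUSTER_EXCEPTIONS`, `split_letters` and `to_cyrillic` are hypothetical, the exception list holds only the single cluster example given in the text, only lowercase input is handled, and the nonstandard spelling dj for đ is not treated.

```python
# Serbo-Croatian Latin letters mapped to their Cyrillic equivalents.
# The digraphs dž, lj, nj count as single letters of the 30-letter alphabet.
LATIN_TO_CYRILLIC = {
    "a": "а", "b": "б", "c": "ц", "č": "ч", "ć": "ћ", "d": "д", "dž": "џ",
    "đ": "ђ", "e": "е", "f": "ф", "g": "г", "h": "х", "i": "и", "j": "ј",
    "k": "к", "l": "л", "lj": "љ", "m": "м", "n": "н", "nj": "њ", "o": "о",
    "p": "п", "r": "р", "s": "с", "š": "ш", "t": "т", "u": "у", "v": "в",
    "z": "з", "ž": "ж",
}
DIGRAPHS = ("dž", "lj", "nj")

# Hypothetical exception list (an assumption of this sketch): for each word,
# the start positions where a digraph-looking pair is really a consonant
# cluster. A real system would need a much larger list or morphological rules.
CLUSTER_EXCEPTIONS = {
    "nadživeti": {2},  # prefix nad- + živeti: "d" and "ž" are separate letters
}


def split_letters(word):
    """Split a lowercase word into alphabet letters, keeping digraphs whole."""
    cluster_positions = CLUSTER_EXCEPTIONS.get(word, set())
    letters = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in DIGRAPHS and i not in cluster_positions:
            letters.append(pair)   # one letter written with two characters
            i += 2
        else:
            letters.append(word[i])
            i += 1
    return letters


def to_cyrillic(word):
    """Transcribe letter by letter; unknown characters (e.g. '-') pass through."""
    return "".join(LATIN_TO_CYRILLIC.get(letter, letter)
                   for letter in split_letters(word))


print(to_cyrillic("nadžak-baba"))  # dž is a digraph here
print(to_cyrillic("nadživeti"))    # d + ž is a consonant cluster here
```

The same letter split could also serve a hyphenation routine, since a break is never allowed inside an item returned as a digraph, while the two letters of a cluster are returned separately.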
Similar resources
A model of the perception of Serbo-Croatian word tone
Purcell (1979) presented data on the perception of Serbo-Croatian word tone by native speakers. The present paper develops a logistic regression model of the perception of Serbo-Croatian word tone using Purcell’s 1979 data. Two models are developed: an overall model and a two-part, split model. Model fits are calculated and plotted. The two-part model fits the perceptual data better. Model coeff...
Visual Word Recognition in Serbo-Croatian Is Necessarily Phonological
In a naming task conducted with bi-alphabetic readers of Serbo-Croatian, it was shown that letter strings that can be assigned both a Roman and a Cyrillic alphabet reading incur longer latencies than the unique alphabet transcription of the same word, and that the magnitude of the difference depended on the number of ambiguous characters in the ambiguous letter string. While this within-word p...
Automatic Prosody Generation for Serbo-Croatian Speech Synthesis Based on Regression Trees
The paper presents the module for automatic generation of prosodic features of synthesized speech, namely f0 targets and phonetic segment durations, within the speech synthesizer AlfaNumTTS, the most sophisticated speech synthesis system for the Serbo-Croatian language to date. The module is based on regression trees trained on a studio-recorded single-speaker database of Serbo-Croatian. The datab...
Transcribing Multilingual Broadcast News Using Hypothesis Driven Lexical Adaptation
This paper describes first results of our DARPA-sponsored efforts toward recognizing and browsing foreign language, more specifically, Serbo-Croatian broadcast news. For Serbo-Croatian as well as many other than the most common well studied languages, the problems of broadcast quality recognition are complicated by 1.) the lack of available acoustic and language data, and 2.) the excessive voca...
Strategies for visual word recognition and orthographical depth: a multilingual comparison.
We investigated the psychological reality of the concept of orthographical depth and its influence on visual word recognition by examining naming performance in Hebrew, English, and Serbo-Croatian. We ran three sets of experiments in which we used native speakers and identical experimental methods in each language. Experiment 1 revealed that the lexical status of the stimulus (high-frequency wo...